adapter: Create deterministic log index IDs #30603

Merged
jkosh44 merged 12 commits into MaterializeInc:main from introspection-source-index-ids on Nov 27, 2024

jkosh44 (Contributor) commented Nov 22, 2024

This commit adds a new variant of GlobalId and CatalogItemId for
introspection source indexes. The values of these IDs are
deterministically derived from the cluster ID and the log variant.

Introspection source indexes are a special edge case of items. They are
considered system items, but they are the only system item that can be
created by the user at any time. All other system items can only be
created by the system during the startup of an upgrade.

Previously, it was possible to allocate the same System ID to two
different objects if something like the following happened:

  1. Materialize version v is running in read-write mode.
  2. Materialize version v + 1 starts in read-only mode.
  3. The next system item ID is s.
  4. v + 1 allocates s for a new system item (table, view,
    introspection source, etc.)
  5. v creates a new user cluster and allocates s through s + n to
    the introspection source indexes in that cluster. At this point we
    have two separate objects with the same Global ID, which is bad.
  6. v + 1 reboots in read-write mode and allocates s + n + 1 to the
    new system item. At this point the new system item has received two
    different IDs, which is also bad.

Putting introspection source index IDs in their own namespace and
making them deterministic removes this issue and ones like it.
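
To make the failure mode concrete, here is a minimal sketch of steps 3 to 5, assuming a simple counter-based allocator. `SystemIdAllocator` and `introspection_index_id` are hypothetical names invented for illustration, and the `* 256` encoding is not the layout used in this PR.

```rust
/// Hypothetical stand-in for the shared "next system item ID" counter.
struct SystemIdAllocator {
    next: u64,
}

impl SystemIdAllocator {
    /// Hands out `n` consecutive IDs starting at the current counter value.
    fn allocate(&mut self, n: u64) -> Vec<u64> {
        let ids: Vec<u64> = (self.next..self.next + n).collect();
        self.next += n;
        ids
    }
}

/// Deterministic alternative: the ID is a pure function of its inputs.
/// The `* 256` encoding is illustrative only (an assumption of this sketch).
fn introspection_index_id(cluster_id: u64, log_variant: u8) -> u64 {
    cluster_id * 256 + u64::from(log_variant)
}

fn main() {
    let s = 100; // next system item ID, as observed by both versions

    // `v + 1` (read-only) and `v` (read-write) each start from the same value.
    let mut new_version = SystemIdAllocator { next: s };
    let mut old_version = SystemIdAllocator { next: s };

    let new_system_item = new_version.allocate(1);
    let cluster_indexes = old_version.allocate(3);

    // Step 5 of the scenario: two different objects now share ID `s`.
    assert_eq!(new_system_item[0], cluster_indexes[0]);

    // With derived IDs, both versions compute the same value for the same
    // (cluster, log variant) pair, and no shared counter is consumed.
    assert_eq!(introspection_index_id(7, 2), introspection_index_id(7, 2));
    assert_ne!(introspection_index_id(7, 2), introspection_index_id(7, 3));
    println!("counter collision at ID {}", cluster_indexes[0]);
}
```

Because the derived ID depends only on the cluster and the log variant, creating a cluster on `v` never touches the counter that `v + 1` uses for its new system items.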

Fixes #MaterializeInc/database-issues/issues/8731

Motivation

This PR fixes a recognized bug.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@jkosh44 jkosh44 force-pushed the introspection-source-index-ids branch 5 times, most recently from 496dce0 to 9241cb6 Compare November 25, 2024 16:40
@jkosh44 jkosh44 force-pushed the introspection-source-index-ids branch from 9241cb6 to 80902a7 Compare November 25, 2024 17:11
@jkosh44 jkosh44 changed the title from "WIP introspection IDs" to "adapter: Create deterministic log index IDs" Nov 25, 2024
@jkosh44 jkosh44 added the T-proto Theme: `$T ⇔ Proto$T` conversions and `*.proto` files label Nov 25, 2024
@jkosh44 jkosh44 force-pushed the introspection-source-index-ids branch 4 times, most recently from a55213c to e7a6c1b Compare November 25, 2024 19:53
jkosh44 (Contributor Author) commented Nov 25, 2024

I have a follow-up PR rebased on top of this PR that adds the new introspection sources back: #30620

@jkosh44 jkosh44 force-pushed the introspection-source-index-ids branch from e7a6c1b to 5f3738b Compare November 25, 2024 19:57
@jkosh44 jkosh44 force-pushed the introspection-source-index-ids branch from 5f3738b to 868e335 Compare November 25, 2024 20:27
@jkosh44 jkosh44 marked this pull request as ready for review November 25, 2024 22:12
@jkosh44 jkosh44 requested review from a team as code owners November 25, 2024 22:12
shepherdlybot bot commented Nov 25, 2024

Risk Score: 83 / 100 Bug Hotspots: 3 Resilience Coverage: 50%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test 🔍 Detected
  • (Required) Observability 🔍 Detected
  • (Required) QA Review
  • (Required) Run Nightly Tests
  • Unit Test
Risk Summary:

The pull request has a high-risk score of 83, driven by predictors such as the average line count in files and executable lines within files. Historically, PRs with these predictors are 154% more likely to cause a bug than the repository baseline. Additionally, the repository's observed and predicted bug trends are both decreasing.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:

File Percentile
../src/coord.rs 100
../src/catalog.rs 99
../catalog/open.rs 99

src/repr/src/catalog_item_id.proto (outdated)
@@ -19,6 +19,7 @@ message GlobalId {
uint64 user = 2;
uint64 transient = 3;
google.protobuf.Empty explain = 4;
uint64 introspection_source_index = 5;
}
}
Contributor

Ooc: Do you know why we have two protobuf definitions for GlobalId?

Contributor Author

We actually have 3, but no I don't.

Member

We have one in the catalog protobuf objects so types we write for the catalog are isolated from the rest of the system, then another in repr for the rest of the system. No idea why we have this one in storage-types though, it can probably be deleted?

/// A user storage instance.
- User(u64),
+ User(u32),
Contributor

This is, I think, the scariest change made here. There is some risk that a user creates and drops a cluster in a loop 4 billion times, bricking the system. Given that we only need this for the GlobalId::IntrospectionSourceIndex generation, and we do manual packing there, and we still have 8 bits left in the representation, could we make this a u40 instead? Or even a u48, if we represent the log variant with 8 bits instead of 16.

There is no u48 type of course, so we'd need to continue using u64 and assert that the high bits are zero. Or do something more exotic like using a third-party crate or storing [u8; 6], but I'm not sure that's worth it.

Contributor Author

I like the u40/u48 idea. I'll try to whip that up. You would probably know the answer to this better than me: is 255 enough for all the introspection sources we ever might have?

Contributor Author

Also if we see people start to get close to u40/u48, we could either:

  • Go back to a u64 and increase the size of GlobalId.
  • Convert GlobalId to a u128 and manually pack it ourselves. That's more than enough space to store all the information we need, even with a Cluster: u64.

Contributor

> is 255 enough for all the introspection sources we ever might have?

Currently we have ~30, and every source has some maintenance and implementation overhead, so at least with the current compute logging system I doubt that we'd ever get more than 255 sources. If I'm wrong we still have the option to steal a couple of bits from the 8 bits we reserve for the GlobalId variant, so that wouldn't be the end of the world.

> Also if we see people start to get close to u40/u48, we could either:

Agreed! The challenge might be noticing that people get close :)

Contributor Author

> Agreed! The challenge might be noticing that people get close :)

I'll add an error log when someone fills up 47 bits so we are alerted.

Contributor Author

Done, I just increased this to 48 bits.
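
A minimal sketch of the manual packing discussed in this thread, assuming the 64-bit value is split into an 8-bit cluster variant, a 48-bit cluster ID, and an 8-bit log variant. The field order and the names `pack` and `unpack` are assumptions made for illustration; the thread only fixes the field widths.

```rust
/// Assumed layout of the 64-bit value (most to least significant):
/// 8 bits cluster variant | 48 bits cluster ID | 8 bits log variant.
const CLUSTER_ID_BITS: u32 = 48;
const LOG_VARIANT_BITS: u32 = 8;
const CLUSTER_ID_MASK: u64 = (1 << CLUSTER_ID_BITS) - 1;

/// Packs the three fields, or returns `None` if the cluster ID needs more
/// than 48 bits.
fn pack(cluster_variant: u8, cluster_id: u64, log_variant: u8) -> Option<u64> {
    if cluster_id & !CLUSTER_ID_MASK != 0 {
        return None;
    }
    let variant = u64::from(cluster_variant) << (CLUSTER_ID_BITS + LOG_VARIANT_BITS);
    let cluster = cluster_id << LOG_VARIANT_BITS;
    Some(variant | cluster | u64::from(log_variant))
}

/// Inverse of `pack`: recovers (cluster variant, cluster ID, log variant).
fn unpack(packed: u64) -> (u8, u64, u8) {
    let cluster_variant = (packed >> (CLUSTER_ID_BITS + LOG_VARIANT_BITS)) as u8;
    let cluster_id = (packed >> LOG_VARIANT_BITS) & CLUSTER_ID_MASK;
    let log_variant = (packed & 0xFF) as u8;
    (cluster_variant, cluster_id, log_variant)
}

fn main() {
    let packed = pack(2, 42, 7).expect("cluster ID fits in 48 bits");
    assert_eq!(unpack(packed), (2, 42, 7));
    // A cluster ID that needs a 49th bit is rejected rather than truncated.
    assert!(pack(2, 1 << 48, 7).is_none());
    println!("{packed:#018x}");
}
```

Rejecting oversized cluster IDs instead of truncating them keeps the mapping injective, which is what makes the derived IDs safe to compute independently on two versions.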

teskje (Contributor) left a comment

I didn't look closely at the adapter parts, but this lgtm. I have a preference for having the string representation of these IDs start with an "s", so that you can still use a NOT LIKE 's%' query to exclude system objects, but there might be reasons why that wouldn't work.

src/adapter/src/catalog/transact.rs (outdated)
src/catalog/src/durable/objects.rs
src/repr/src/global_id.rs
@@ -92,6 +99,7 @@ impl FromStr for GlobalId {
let val: u64 = s[1..].parse()?;
match s.chars().next().unwrap() {
's' => Ok(GlobalId::System(val)),
'i' => Ok(GlobalId::IntrospectionSourceIndex(val)),
Contributor

Should we make this prefix si, to make it clear that this is still a system ID? Perhaps a question to bring before the SQL council, if it wasn't already.

Contributor Author

I've pushed a commit to switch this to si.

ParkMyCar (Member) left a comment

LGTM!

Will follow up in SQL council about the string repr of the ids

src/catalog/protos/objects.proto (outdated)
/// to.
/// Cluster ID Inner Value: A per variant unique number indicating the cluster the index
/// belongs to.
/// Log Variant: A unique number indicating the log variant this index is on.
Member

This has probably been discussed at length already so no need to change, but it's a bit surprising to me that we're only leaving 8 bits for the log variant especially when we already have 31 variants. This feels like something we could realistically run up against in the future.

Contributor Author

Not necessarily at length, but we have discussed it: #30603 (comment).

In general if we need to increase the size of any of these fields in the future, then we can do a similar migration where we change the representation of GlobalId and re-assign every ID to introspection source indexes.

Contributor

I think we wouldn't even need a migration, just changing the packing code to spill the log variant into free bits of the cluster ID variant should be enough. But yeah the important thing is that the number of log variants is under our control, in contrast to the number of clusters, so I'm much less worried about the former.

Member

Ahhh, both comments make sense to me and alleviate all concerns, thanks!

const WARN_MASK: u64 = 1 << 47;
if MASK & id == 0 {
if WARN_MASK & id != 0 {
error!("{WARN_MASK} or more `StorageInstanceId`s allocated, we will run out soon");
Member

maybe make this a soft assert?

Contributor Author

I don't think a soft assert gives us anything additional. There's nothing incorrect about this being true and it's not something we want to avoid in tests. We just want to be warned if it happens in prod so we can start coming up with a plan to increase the width of cluster IDs.
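
A minimal sketch of the check under discussion, assuming `MASK` covers the high 16 bits that must stay zero for a 48-bit ID. The `IdCheck` enum and `check_id` function are hypothetical, and `eprintln!` stands in for the `error!` logging macro so the example has no dependencies.

```rust
/// High 16 bits that must stay zero so the ID fits in 48 bits (assumed to
/// correspond to `MASK` in the snippet above).
const MASK: u64 = 0xFFFF << 48;
/// Bit 47: once set, half of the 48-bit space has been used.
const WARN_MASK: u64 = 1 << 47;

#[derive(Debug, PartialEq)]
enum IdCheck {
    Ok,
    /// Valid, but close enough to the limit that operators should be told.
    OkButWarn,
    TooLarge,
}

/// Classifies an ID without panicking: reaching the warning threshold is not
/// a bug, only a signal that the ID width needs attention.
fn check_id(id: u64) -> IdCheck {
    if MASK & id != 0 {
        IdCheck::TooLarge
    } else if WARN_MASK & id != 0 {
        // Stand-in for the `error!` log call in the PR.
        eprintln!("{WARN_MASK} or more IDs allocated, we will run out soon");
        IdCheck::OkButWarn
    } else {
        IdCheck::Ok
    }
}

fn main() {
    assert_eq!(check_id(1), IdCheck::Ok);
    assert_eq!(check_id(1 << 47), IdCheck::OkButWarn);
    assert_eq!(check_id(1 << 48), IdCheck::TooLarge);
}
```

Logging rather than asserting matches the reasoning above: crossing bit 47 is a valid state that should raise an alert in production, not fail a test.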

jkosh44 (Contributor Author) commented Nov 27, 2024

@ParkMyCar will you take another look at the commits since your last review? I haven't force pushed so you should be able to review those in isolation if you want.

ParkMyCar (Member) commented

@jkosh44 looking now!

ParkMyCar (Member) left a comment

LGTM, except for the comment on parsing

let variant = match tag {
's' => {
if Some('i') == s.chars().next() {
s = &s[1..];
Member

Shouldn't this be:

Suggested change
- s = &s[1..];
+ s = &s[2..];

Since the first two bytes are si?

Contributor Author

I'm pretty sure this is correct as is. We've already consumed the first byte above on line 75 (s = &s[1..];). We just need to consume one extra byte for this variant because it has one more letter than the rest.
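
A minimal sketch of the two-step prefix handling, assuming a simplified ID type. `ItemId` and `parse_id` are hypothetical names, and the error type is a plain `String` rather than the `anyhow` error used in the PR.

```rust
#[derive(Debug, PartialEq)]
enum ItemId {
    System(u64),
    IntrospectionSourceIndex(u64),
    User(u64),
}

fn parse_id(input: &str) -> Result<ItemId, String> {
    let mut s = input;
    let tag = s.chars().next().ok_or_else(|| "empty id".to_string())?;
    // The first character is consumed here, for every variant.
    s = &s[tag.len_utf8()..];
    let variant: fn(u64) -> ItemId = match tag {
        's' => {
            if s.starts_with('i') {
                // Only one more byte to skip: the leading `s` is already gone.
                s = &s[1..];
                ItemId::IntrospectionSourceIndex
            } else {
                ItemId::System
            }
        }
        'u' => ItemId::User,
        _ => return Err(format!("couldn't parse id {input}")),
    };
    let val: u64 = s
        .parse()
        .map_err(|_| format!("couldn't parse id {input}"))?;
    Ok(variant(val))
}

fn main() {
    assert_eq!(parse_id("s12"), Ok(ItemId::System(12)));
    assert_eq!(parse_id("si12"), Ok(ItemId::IntrospectionSourceIndex(12)));
    assert_eq!(parse_id("u5"), Ok(ItemId::User(5)));
    assert!(parse_id("x1").is_err());
}
```

Skipping two bytes inside the `'s'` arm would drop the first digit, so `si12` would parse as 2 instead of 12.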

_ => return Err(anyhow!("couldn't parse id {}", s)),
};
let val: u64 = s.parse()?;
Ok(variant(val))
Member

woah, I had no idea you could construct enums like this!

Contributor Author

My understanding is that tuple-style enum variants are actually just functions under the hood. So the type of variant is effectively fn(u64) -> CatalogItemId. Then variant(val) is a normal function call that returns a CatalogItemId.
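
A small standalone illustration of that point. The `CatalogItemId` below is a simplified stand-in with two variants, not the real definition, and `build` is a hypothetical helper.

```rust
#[derive(Debug, PartialEq)]
enum CatalogItemId {
    System(u64),
    User(u64),
}

/// Accepts any constructor with the right signature, including a tuple
/// variant of the enum.
fn build(ctor: fn(u64) -> CatalogItemId, val: u64) -> CatalogItemId {
    ctor(val)
}

fn main() {
    // A tuple variant's name can be used as a function value.
    let ctor: fn(u64) -> CatalogItemId = CatalogItemId::User;
    assert_eq!(ctor(7), CatalogItemId::User(7));
    assert_eq!(build(CatalogItemId::System, 3), CatalogItemId::System(3));

    // The same property makes variants usable with iterator adapters.
    let ids: Vec<CatalogItemId> = vec![1u64, 2]
        .into_iter()
        .map(CatalogItemId::System)
        .collect();
    assert_eq!(ids, vec![CatalogItemId::System(1), CatalogItemId::System(2)]);
}
```

This applies to tuple-style variants only; unit variants are plain values and struct-style variants have no constructor function.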

let variant = match tag {
's' => {
if Some('i') == s.chars().next() {
s = &s[1..];
Member

same as above

@jkosh44 jkosh44 merged commit 14cc787 into MaterializeInc:main Nov 27, 2024
204 of 219 checks passed
@jkosh44 jkosh44 deleted the introspection-source-index-ids branch November 27, 2024 19:38
jkosh44 added a commit to jkosh44/materialize that referenced this pull request Nov 27, 2024
Labels
T-proto Theme: `$T ⇔ Proto$T` conversions and `*.proto` files

3 participants